-
Notifications
You must be signed in to change notification settings - Fork 5.1k
http: new body transform filter with the substitution formatter #41612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: wbpcode <[email protected]>
… dev-body-transform
Signed-off-by: wbpcode <[email protected]>
|
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
|
cc @arkodg cc @agrawroh for you guys may care about this. cc @botengyao for this may helpful for AI traffic routing. |
Signed-off-by: WangBaiping <[email protected]>
Signed-off-by: WangBaiping <[email protected]>
adisuissa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Left a few API comments
| // to the existing body. If not set, the result of ``body_format_string`` will replace | ||
| // the existing body. | ||
| // True by default. | ||
| google.protobuf.BoolValue patch_format_string = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is highly coupled with body_format_string and that is optional, would it make sense to move both to an internal message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
| // And except the commonly used format specifiers, there are some additional format specifiers | ||
| // provided by the transform filter: | ||
| // | ||
| // * ``%RQ_BODY(X)%``: the request body. And ``X`` is the JSON pointer to extract a specific |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this require a JSON-body? I may have missed it, but this seems to be a generic transform, and doesn't require JSON-body to be sent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JSON body is required. We have documented it that only JSON body is supported for now.
| // * ``%RQ_BODY(X)%``: the request body. And ``X`` is the JSON pointer to extract a specific | ||
| // field from the JSON body. If ``X`` is empty, the whole body will be used. | ||
| // * ``%RS_BODY(X)%``: the response body. And ``X`` is the JSON pointer to extract a specific | ||
| // field from the JSON body. If ``X`` is empty, the whole body will be used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the whole body will be used
FWIW this might cause the headermap to go over the limit, and may be an attack vector, but as this filter doesn't have a security posture, it might be ok.
Also it would be good to clarify whether using this filter implies that all requests/responses are buffered, even if headers_mutations are defined without referencing RQ/RS_BODY. If there's a need to support both buffering and streaming modes, consider adding an enum field specifying the mode (and maybe only support one for now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At next few weeks, I will try to support the event stream for the response. We will handle it as stream for that. But for JSON request/response, buffering always be used.
|
|
||
| // Body transformation configuration. If not set, no body transformation will be performed. | ||
| // NOTE the ``RQ_BODY`` and ``RS_BODY`` format specifiers can also be used here. | ||
| config.core.v3.SubstitutionFormatString body_format_string = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thoughts on using JSONPatch semantics here ?
https://jsonpatch.com
its used in Envoy Gateway API to edit xDS https://gateway.envoyproxy.io/docs/tasks/extensibility/envoy-patch-policy/#customize-response
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. But it's will have big effect to the current implementation. I will update the API to keep the patch mode as enum and then we can support it in the future.
Co-authored-by: Adi (Suissa) Peleg <[email protected]> Signed-off-by: code <[email protected]>
Co-authored-by: Adi (Suissa) Peleg <[email protected]> Signed-off-by: code <[email protected]>
Co-authored-by: Adi (Suissa) Peleg <[email protected]> Signed-off-by: code <[email protected]>
Signed-off-by: WangBaiping <[email protected]>
botengyao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for adding this filter! have one question out of curiosity.
| if (end_stream) { | ||
| decoder_callbacks_->addDecodedData(data, true); | ||
| handleCompleteRequestBody(); | ||
| return Http::FilterDataStatus::Continue; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related but not tied / blocker to this PR, do you have some insights for the streaming JSON parser support without waiting for the whole body once we get the enough data. I think the current nlohmann lib doesn't quite support it, and we could need some SAX style approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rapidJson could help? But we removed that because suggestions from google. 🤣
Signed-off-by: WangBaiping <[email protected]>
Signed-off-by: WangBaiping <[email protected]>
|
friendly ping for a new review cc @adisuissa |
|
/retest |
| // Only JSON body transformation is supported for now. | ||
| message TransformConfig { | ||
| // Configuration for transforming request. If set then the request will be buffered | ||
| // until the entire body is received and transformed for JSON request bodies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think this needs to also address HTTP trailers case?
(I'm not saying to add it now, but there may need to be a config knob in the future that decides what needs to be buffered before sending it down/up the stream. I believe this may be in the "Transformation" message).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM and make sense. Let's the future requirement/issue to tell us!
|
|
||
| // Configuration for the transform filter. The request and response transformations are | ||
| // independent and could be configured separately. | ||
| // Only JSON body transformation is supported for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there should just be a JsonTransform filter here, instead of something generic. The reason I bring this up is because the transformations seem to be highly coupled with JSON bodies. (this is a discussion and not a a call for action).
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually depends the feedback from the users and community, I personally hope this filter could handle various content type like plain text or url form by extend other fields in this filter.
JSON is supported first is because that the JSON is used widely for now in open API and AI related things. In addition, for the response, we need to support event stream to support AI traffic.
| // lookup key in the namespace with the option of specifying nested keys separated by ':'. | ||
| // * ``%RESPONSE_BODY(KEY*)%``: the response body. And ``Key`` KEY is an optional | ||
| // lookup key in the namespace with the option of specifying nested keys separated by ':'. | ||
| repeated config.common.mutation_rules.v3.HeaderMutation headers_mutations = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the headers be blocked until the body is buffered even if headers_mutations isn't configured?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, yes for simplification of the code. But this is good point that could be optimized.
Hmmm, it will require the decodeData to handle the continue or stop of decodeHeaders correctly. Seems it's not deserved to optimize because the additional complexity 🤔 The filter will not be performant anyway because the body processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that this is an implementation issue, but I think it should be spelled clearly in a note above (example: buffering of the entire request headers and body will always happen on a request even if only headers are transformed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
|
|
||
| // Body transformation configuration. If not set, no body transformation will be performed. | ||
| // NOTE the ``REQUEST_BODY`` and ``RESPONSE_BODY`` format specifiers can also be used here. | ||
| config.core.v3.SubstitutionFormatString body_format = 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused here. This field type is SubstitutionFormatString which by the name suggests that there will be a substitution. However, the action can either be MERGE or REPLACE. Is the MERGE essentially a substitute? If so, what is REPLACE in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Substitution in SubstitutionFormatString is means the substitution formatter will be used in the format string. For example, in the final generated output, %REQ(xxx)% in the format string will be substituted by the actual header value.
And the action is used to indicate how to handle the final generated output of the body_format and the original body content.
I will add some more comment to make it more clear.
| repeated config.common.mutation_rules.v3.HeaderMutation headers_mutations = 1; | ||
|
|
||
| // The body transformation configuration. If not set, no body transformation will be performed. | ||
| BodyTransformation body_transformation = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you see a case for multiple body transformations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. This is a one time logic. We will generate a new body based on the body_format and then the new body will replace or be merge into original body content.
If multiple phase transformations is necessary, the users could configure multiple transform filters in the filter chain anyway. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the same can be said about headers, but I agree that headers are more likely to have multiple transformations (one for each header key).
Co-authored-by: Adi (Suissa) Peleg <[email protected]> Signed-off-by: code <[email protected]>
Co-authored-by: Adi (Suissa) Peleg <[email protected]> Signed-off-by: code <[email protected]>
Co-authored-by: Adi (Suissa) Peleg <[email protected]> Signed-off-by: code <[email protected]>
Co-authored-by: Adi (Suissa) Peleg <[email protected]> Signed-off-by: code <[email protected]>
Signed-off-by: WangBaiping <[email protected]>
…into dev-body-transform
|
/retest |
adisuissa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall API LGTM.
Left a suggestion to clarify the implication of using this filter (buffering) in a note (for Envoy at least).
| repeated config.common.mutation_rules.v3.HeaderMutation headers_mutations = 1; | ||
|
|
||
| // The body transformation configuration. If not set, no body transformation will be performed. | ||
| BodyTransformation body_transformation = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the same can be said about headers, but I agree that headers are more likely to have multiple transformations (one for each header key).
| // lookup key in the namespace with the option of specifying nested keys separated by ':'. | ||
| // * ``%RESPONSE_BODY(KEY*)%``: the response body. And ``Key`` KEY is an optional | ||
| // lookup key in the namespace with the option of specifying nested keys separated by ':'. | ||
| repeated config.common.mutation_rules.v3.HeaderMutation headers_mutations = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that this is an implementation issue, but I think it should be spelled clearly in a note above (example: buffering of the entire request headers and body will always happen on a request even if only headers are transformed).
Signed-off-by: WangBaiping <[email protected]>
adisuissa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm api
|
/retest |
Signed-off-by: WangBaiping <[email protected]>
|
We added a new type alias in this PR's previous commit. But another alias for same type was added in the main by recently merged PR. So, after merged the main, at the latest commit, we removed the unnecessary new alias and reuse the another alias. |
|
gently ping for a review @yanavlasov |
|
/retest |
Commit Message: http: new body transform filter with the substitution formatter
Additional Description:
To close #35783. There is enough context in the #35783. In short sentence, this PR add a transform filter to modify request or response headers by the substitution formatter API.
It's also possible to add body attributes to headers and refresh the routes.
Risk Level: low. New HTTP filter.
Testing: unit/integration.
Docs Changes: added.
Release Notes: added.
Platform Specific Features: n/a.